Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dtype-i128 feature flag and Int128Type #6374

Merged
merged 6 commits into from
Feb 2, 2023

Conversation

plaflamme
Copy link
Contributor

This adds a new datatype: Int128Type which is the primitive type used by arrow to represent Decimal types. This new type is behind a dtype-i128 feature flag.

Introducing this type requires abstracting over the arithmetic used for PrimitiveArray because decimal arithmetic must keep track of changes to precision / scale. This PR adds a new trait ArrayArithmetics which delegates to basic for all primitive types except i128 and decimal for i128. There's probably a better approach, but this is the extent of what my Rust-foo is capable of delivering.

There are a few problems:

  • div_scalar in basic and decimal do not equivalent signatures: unclear how we can abstract over this
  • rem and rem_scalar are not implemented in decimal (this doesn't seem like a blocker)

This is obviously just a draft to get some feedback.

Decimal arithmetics require manipulating the DataType when doing some
operations, i.e.: changing precision/scale
Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I think we should not think of decimal yet and just implement i128 as its own type.

The future decimal type will be a wrapper around this and that type will have to figure out how to deal with arithmetic, it is not a concern of the physical i128 type. So the arithmetic can continue as is.

@@ -82,6 +82,8 @@ impl_polars_datatype!(Int8Type, Int8, i8);
impl_polars_datatype!(Int16Type, Int16, i16);
impl_polars_datatype!(Int32Type, Int32, i32);
impl_polars_datatype!(Int64Type, Int64, i64);
#[cfg(feature = "dtype-i128")]
impl_polars_datatype!(Int128Type, Unknown, i128);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unknown should be Int128Type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably, you mean Int128 (which would be DataType::Int128)?

If so, I don't think that will work because I think DataType will eventually be something like DataType::Decimal(precision, scale), right?

Copy link
Member

@ritchie46 ritchie46 Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but then I think we should make the DataType something like DataType::Decimal(Option<precision, scale>). Then we can fill in the blanks later.

@plaflamme
Copy link
Contributor Author

@ritchie46 Thanks for the comments. The issue is that NumericNative has a trait bound of NativeArithmetics here. But that's not implemented for i128 precisely because the arithmetics for i128 are not the same as the native i128 Rust type (because it also needs to track precision and scale changes). This is mentioned here.

So when we introduce i128 as a polars "primitive" type, we can no longer respect this trait bound and must already abstract over it.

Unless what you're suggesting is that we don't use i128 and use a wrapper type so we can implement NativeArithmetics for it?

@plaflamme
Copy link
Contributor Author

@ritchie46 Any additional feedback or guidance you could provide? I might have some more time over the next few days to look at this.

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make clippy happy, then we can go forward.

@@ -82,6 +82,8 @@ impl_polars_datatype!(Int8Type, Int8, i8);
impl_polars_datatype!(Int16Type, Int16, i16);
impl_polars_datatype!(Int32Type, Int32, i32);
impl_polars_datatype!(Int64Type, Int64, i64);
#[cfg(feature = "dtype-i128")]
impl_polars_datatype!(Int128Type, Unknown, i128);
Copy link
Member

@ritchie46 ritchie46 Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but then I think we should make the DataType something like DataType::Decimal(Option<precision, scale>). Then we can fill in the blanks later.

@plaflamme
Copy link
Contributor Author

Alrighty, I've fixed the warning in b803b6d and added DataType::Decimal128 in 0d0a898 PTAL

@ritchie46
Copy link
Member

Thanks @plaflamme. The next step is adding this Series. Would you want to have this merged first and then continue on the Series later?

@plaflamme
Copy link
Contributor Author

plaflamme commented Feb 1, 2023

If we can get this merged, that would be preferable, yeah. Though I'd be worried with the todo!()s introduced since those will cause panics which isn't particularly good UX!

@ritchie46
Copy link
Member

If we can get this merged, that would be preferable, yeah. Though I'd be worried with the todo!()s introduced since those will cause panics which isn't particularly good UX!

Nope, it is an incremental effort.

@ritchie46
Copy link
Member

Thanks @plaflamme. Next up a Series of Decimal 🚀 ;)

@ritchie46 ritchie46 merged commit 8ff0dcf into pola-rs:master Feb 2, 2023
@plaflamme plaflamme deleted the dtype-i128 branch February 3, 2023 05:23
@plaflamme
Copy link
Contributor Author

@ritchie46 great, thanks for merging this. Do you have any pointers to get me started on Series? I took a very brief look and it seems like there are a few things that will need attention. Would you suggest starting with the implementations module?

@ritchie46
Copy link
Member

Yes.. And you need to make newtype that is ChunkedDecimal wrapped around ChunkedArray<128>

@plaflamme
Copy link
Contributor Author

@ritchie46 am I on the right track with this?

@ritchie46
Copy link
Member

Yes! Most methods can be simply dispatched to ChunkedArray<128> from the Series implementation.

If you open a PR we could track this.

@plaflamme
Copy link
Contributor Author

@ritchie46 here's a PR: #6374

@aldanor
Copy link
Contributor

aldanor commented Feb 22, 2023

@plaflamme Thanks for your work on this! I'm very exicted in seeing the int128 support becoming full-fledged (and yet another thing to distance polars from pandas...).

@ritchie46 wonder what's the approximate roadmap to getting int128/decimals supported all the way through to the Python layer, what are the next steps from here? (I could try to contribute myself if the tasks are reasonably narrow-scoped and if it helps to speed it up)

@plaflamme
Copy link
Contributor Author

@aldanor I've started some work in this branch. I don't really know what I'm doing, I'm mostly following breadcrumbs from datetime and other types that are similar to Decimal. In any case, help on this would be appreciated, feel free to make PRs against my branch.

@ritchie46 any guidance would be appreciated

@aldanor
Copy link
Contributor

aldanor commented Feb 23, 2023

@plaflamme Maybe open a PR in your own repo and tag me? Skimming through, I might have a few random suggestions - would be easier to comment in the PR

@aldanor
Copy link
Contributor

aldanor commented Feb 23, 2023

Actually, one (perhaps weird) question to discuss before we hack too deep: could we just get rid of precision altogether? This would potentially simplify quite a lot of the code.

It won't change any computational logic or affect any casting as it's used solely for validation. Can't we just always use max-precision, i.e. 38, internally?

Off top of my head, the only thing it can visibly affect is formatting, i.e. 123.45 with (5, 2) will get printed as "123.45" whereas 123.45 with (7, 3) will get printed as 0123.450, IIRC. Well... that, plus validation errors that may pop up during runtime where your value doesn't fit declared precision – do we want that?

Another related question is whether scale (and precision if it's used) should be an Option<>? If we need an Int128Type, shouldn't it be a separate thing/datatype? Also, there's no such datatype as "int128" in arrow, only the Decimal128 itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-dtype-decimal Area: decimal data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants